Exploring Antimicrobial Resistance (AMR) genes within wild and domestic animal populations ¶
Publication¶
Skarżyńska M, Leekitcharoenphon P, Hendriksen RS, Aarestrup FM, Wasyl D (2020) A metagenomic glimpse into the gut of wild and domestic animals: Quantification of antimicrobial resistance and more. PLoS ONE 15(12):e0242987. https://doi.org/10.1371/journal.pone.0242987
Abstract¶
Antimicrobial resistance (AMR) in bacteria is a complex subject, why one need to look at this phenomenon from a wider and holistic perspective. The extensive use of the same antimicrobial classes in human and veterinary medicine as well as horticulture is one of the main drivers for the AMR selection. Here, we applied shotgun metagenomics to investigate the AMR epidemiology in several animal species including farm animals, which are often exposed to antimicrobial treatment opposed to an unique set of wild animals that seems not to be subjected to antimicrobial pressure. The comparison of the domestic and wild animals allowed to investigate the possible anthropogenic impact on AMR spread. Inclusion of animals with different feeding behaviors (carnivores, omnivores) enabled to further assess which AMR genes that thrives within the food chain. We tested fecal samples not only of intensively produced chickens, turkeys, and pigs, but also of wild animals such as wild boars, red foxes, and rodents. A multi-directional approach mapping obtained sequences to several databases provided insight into the occurrence of the different AMR genes. The method applied enabled also analysis of other factors that may influence AMR of intestinal microbiome such as diet. Our findings confirmed higher levels of AMR in farm animals than in wildlife. The results also revealed the potential of wildlife in the AMR dissemination. Particularly in red foxes, we found evidence of several AMR genes conferring resistance to critically important antimicrobials like quinolones and cephalosporins. In contrast, the lowest abundance of AMR was observed in rodents originating from natural environment with presumed limited exposure to antimicrobials. Shotgun metagenomics enabled us to demonstrate that discrepancies between AMR profiles found in the intestinal microbiome of various animals probably resulted from the different antimicrobial exposure, habitats, and behavior of the tested animal species.
📝 Learning Objectives ¶
Utilizing BV-BRC bioinformatic resources to learn the basic concepts of a bioinformatic workflow.
Analyzing the quality of metagenomic sequence data.
Performing taxonomic classification of metagenomic sequence data and producing informative figures.
Understanding how to use publicly available databases to classify AMR genes within a metagenome.
Building confidence in our ability to use bioinformatic tools to address real-world questions.
Part 0A: Sign into the BV-BRC and access exercise material ¶
At this point in time, you should have already created an account on the BV-BRC website. If you haven't, please do so now!
You can access all the necessary material through this workspace. You must be signed in first to access the workspace. Please let us know if any issues arise.
Copying the sequences to your own workspace¶
While you can access publicly-available workspaces, you should copy the "Exercise Material" folder over to your own workspace first before beginning your analyses. You can do this by clicking the folder and selecting "Copy" on the sidebar found on the right-hand side of your screen.
Next, you will need to select which folder/workspace you would like as the destination for the copied folder. This destination could be your home workspace, or a workspace you created for this assignment.
If you decide not to choose any folder shown on the Copy menu, the copied folder will automatically be placed in your home workspace.
Part 0B: FASTQ Utilities: Quality control and trimming ¶
Before starting any bioinformatic analysis, it is essential to understand the quality of the FASTQ reads you intend to use. Quality control and read trimming are necessary steps used to remove low-quality reads, adapter sequences, and contaminants prior to downstream analyses. Doing so can ensure the presence of only high-quality sequence data, allowing for more accurate and reliable results.
FASTQ Utilities Service¶
Quality control and trimming of your sequence data can be performed with the FASTQ Utilities Service, which is accessed under the Tools and Services drop-down menu, within the Utilities section.
Parameters¶
Under this section, you can select the Output Folder where you would like the FASTQ Utilities Service to upload your results. Under Output Name, you should choose a unique title to help you distinguish the results of this service from other results you will receive later in the pipeline.
Pipeline¶
Various pipelines can be selected under the drop-down menu. For this exercise, we will use the FastQC program to perform quality checks on raw sequencing data from high throughput sequencing pipelines.
Paired read library¶
Here, you will select the sequence data on which you will perform quality checks. Sequence data can be input as a paired-end library, as a single-end library, or by using the SRA accession number from NCBI. Once completed, click the arrow button in the right-hand corner to add these files to your Selected Libraries. Hit Submit once complete.
Part 1: Taxonomic Classification Service (TCS) ¶
The BV-BRC taxonomic classification service is a useful tool for exploring the microbial composition of metagenomic samples. With this, we can compare the relative abundance of taxa accounting for at least 1% of the total read hits between the various domestic and wild animal populations within the study. Additionally, the TCS will return the quality control metrics for the raw reads of each sample and provide us insight into the structure of each microbial community based on alpha and beta-diversity metrics.
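To make the idea of an alpha-diversity metric concrete, here is a minimal Python sketch that computes richness and the Shannon index for a single sample. It is only an illustration: the phylum names and read counts are made up, and the Shannon index is one common alpha-diversity metric, not necessarily the exact one the TCS reports.

```python
import math


def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total  # proportion of reads assigned to this taxon
            h -= p * math.log(p)
    return h


# Hypothetical read counts per phylum for two made-up samples.
even_sample = {"Firmicutes": 250, "Bacteroidetes": 250,
               "Proteobacteria": 250, "Actinobacteria": 250}
skewed_sample = {"Firmicutes": 970, "Bacteroidetes": 10,
                 "Proteobacteria": 10, "Actinobacteria": 10}

for name, sample in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, richness(sample), round(shannon_index(sample), 3))
```

Both made-up samples contain the same four phyla, yet the sample dominated by one phylum has a much lower Shannon index than the evenly distributed one.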
Input File¶
To use this service, raw FASTQ files must be entered as input files, either as single or paired-end reads, or they can be accessed directly from the NCBI database with an SRA Accession number. Regardless of the input type, you should choose a sample identifier name that will easily distinguish each sample from the others.
Before a second sample can be added, you will need to click the arrow in the top-right corner to add your input file to the Selected Libraries.
Parameters¶
Adjusting the parameters prior to submitting the job allows us to control how the service analyzes our samples. Since we are working with metagenomic samples, Whole Genome Sequencing should be selected under Sequencing Type. We will perform a Microbiome Analysis and will use the BV-BRC Database as our reference database. Filtering host reads is optional, but it may be useful to filter out Homo sapiens reads from your dataset. The Confidence Interval should be set to 0.1, and the Output Folder and Output Name should be selected appropriately. Hit Submit once complete.
Job Results¶
The status of your job submission can be viewed by either clicking the Jobs tab in the bottom-right corner of your screen or by clicking My Jobs under the Workspaces drop-down menu.
Raw read quality scores¶
While the FASTQ utilities service is a tool developed specifically for the processing of raw sequencing data, the quality control metrics for each sample can also be viewed within their respective folder. After accessing the sample folder, you can view the QC reports by selecting the fastqc_results folder and choosing either the host_removed_reads or the raw_reads folder. All job results that can be viewed directly within your web browser will be saved under a .html extension.
Before getting too deep into your analysis, it's important to first ensure the quality of your data is good enough to provide you with reliable and accurate insights into your microbiome sample. To do this, you should view the per base sequence quality graphic for your forward and reverse (e.g., R1 and R2) reads.
This report gives you the average (the blue line), median (the red line), and overall distribution (the yellow box plot) of quality scores (the y-axis) for all of the reads within your files at each nucleotide position (the x-axis). Having a "good" quality score, or being within the "green zone" of values higher than 28, can be interpreted as a nucleotide position having a low probability of representing a sequencing error (a good thing!). It is normal for quality scores to drop slightly near the end of your reads, but as long as the average quality scores remain above 28, there is no need to make any adjustments to your raw reads before continuing your bioinformatic workflow.
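To see why a score of 28 counts as "good", it can help to convert quality scores into error probabilities. The sketch below uses the standard Phred relationship Q = -10 * log10(P) and assumes the quality string uses the Phred+33 ASCII encoding that is typical of current Illumina FASTQ files. The example quality string is made up.

```python
def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality(quality_string):
    """Average Phred score across all positions of one read."""
    scores = decode_phred33(quality_string)
    if not scores:
        raise ValueError("quality string is empty")
    return sum(scores) / len(scores)


# Hypothetical quality line from a FASTQ record: 'I' encodes 40, '=' encodes 28.
example_quality = "IIII="
print(decode_phred33(example_quality))
print(mean_quality(example_quality))
print(phred_to_error_probability(28))  # roughly 0.0016, about 1 error in 630 calls
```

A score of 28 therefore corresponds to an error probability of well under 1%, which is why reads that stay above this threshold generally do not need trimming.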
💻 Task 1: TCS Data Visualization ¶
While there are a multitude of informative outputs provided by the TCS, the Taxonomic-Classification-Service-BVBRC_multiqc_report.html file contains interactive plots and diagrams used to illustrate the taxonomic composition, quality, and general statistics of your metagenomic samples. For your first task, you should access this document and:
Using the results from the Bracken computational tool, create a stacked bar graph that illustrates the percent abundance (or the relative abundance) of the top 5 phyla for each sample. You can save this graph by using the Export Plot button to download your graph as a PNG file.
For each sample, identify the top bacterial phylum and include its percent abundance.
NOTE: Chordata is not a bacterial phylum!
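The percent abundance shown in the plot is simply each phylum's share of the classified reads. As a rough illustration, the Python sketch below ranks phyla by made-up read counts and skips Chordata when picking the top bacterial phylum. The counts, the phylum names, and the `NON_BACTERIAL` set are hypothetical and are not taken from the exercise data.

```python
# Phyla to ignore when looking for the top *bacterial* phylum (illustrative only).
NON_BACTERIAL = {"Chordata"}


def top_phyla_percent(read_counts, n=5):
    """Return the n most abundant phyla as (name, percent of all reads)."""
    total = sum(read_counts.values())
    if total == 0:
        return []
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, 100 * count / total) for name, count in ranked[:n]]


def top_bacterial_phylum(read_counts):
    """Most abundant phylum after skipping non-bacterial entries, or None."""
    for name, percent in top_phyla_percent(read_counts, n=len(read_counts)):
        if name not in NON_BACTERIAL:
            return name, percent
    return None


# Hypothetical phylum-level read counts for one sample.
sample_counts = {"Firmicutes": 500, "Bacteroidetes": 300, "Chordata": 100,
                 "Proteobacteria": 50, "Actinobacteria": 30, "Fusobacteria": 20}

print(top_phyla_percent(sample_counts))
print(top_bacterial_phylum(sample_counts))
```

Note that the percentages here are relative to all reads in the dictionary, including host (Chordata) reads, so they will change if host reads are filtered out first.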
Part 2: Metagenomic Read Mapping Service (MRMS) ¶
The MRMS is a valuable resource for researchers interested in identifying antimicrobial resistance genes present within metagenomic samples. To do this, the MRMS uses k-mer alignment to align sequence data to reference genes within the Comprehensive Antibiotic Resistance Database (CARD). Not only can this service determine the number of different AMR genes present within a metagenomic sample, but it can also provide insights into the abundance of each gene based on the mapped sequencing depth (e.g., the number of reads that align with a specific AMR reference gene).
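For intuition about what k-mer based matching means, the toy Python sketch below breaks a read into overlapping k-mers and picks the reference gene that shares the most of them. This is a deliberately simplified illustration and not the algorithm used by the MRMS or by KMA, which is far more sophisticated. The gene names and sequences are invented.

```python
def kmers(sequence, k):
    """Set of all overlapping substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def best_matching_gene(read, reference_genes, k=5):
    """Return (gene name, shared k-mer count) for the best-matching reference."""
    read_kmers = kmers(read, k)
    best_name, best_shared = None, 0
    for name, gene_sequence in reference_genes.items():
        shared = len(read_kmers & kmers(gene_sequence, k))
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name, best_shared


# Hypothetical reference "genes" and one hypothetical sequencing read.
reference_genes = {
    "gene_A": "ATGGCGTACGTTAGCCTAGGA",
    "gene_B": "ATGTTTCCCGGGAAATTTCCC",
}
read = "GCGTACGTTAGC"

print(best_matching_gene(read, reference_genes))
```

Counting how many reads land on each reference gene in this way is the basic idea behind using mapped depth as a measure of gene abundance.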
Input File¶
Similar to the TCS, raw FASTQ files must be entered as input files, either as single or paired-end reads, or they can be accessed directly from the NCBI database with an SRA Accession number. However, an MRMS job submission should only consist of one sample at a time since this service will survey all metagenomic reads collectively from every sample listed within the selected libraries.
Parameters¶
For the parameters, Predefined List should be selected under Gene Set Type, and CARD should be chosen as the Predefined Gene Set Name. As in the previous analysis, the Output Folder and Output Name should be selected appropriately. Hit Submit once complete.
Job Result¶
To simplify your learning experience using this BV-BRC resource, the original MRMS outputs have been converted to CSV files that can be easily interpreted and viewed directly within the BV-BRC Workspace. The files can be found within the MRM_CSV_FILES folder with the rest of the Exercise Material.
💻 Task 2: Comparing the AMR gene population between wild and domestic animals ¶
Now that we have completed our analysis, we can begin to address our initial question: How does the AMR gene population within the gut microbiome of domestic animals differ from what's found within wild animal populations?
1a) The figure above illustrates the AMR gene richness (A), AMR gene diversity (B), bacterial richness (C), and bacterial diversity (D) of our wild and domestic animal populations. Briefly describe the difference between species richness and species diversity.
1b) Which population type (domestic or wild) has the higher AMR gene richness? Which population type seems to have the higher AMR gene diversity?
1c) Which population type has a higher bacterial richness? Which one has the higher bacterial diversity?
2a) The figure above illustrates the correlation between AMR gene richness and bacterial richness (A) as well as between AMR gene richness and bacterial diversity (B) for wild and domestic animal populations, collectively. Based on these images, we can see that AMR gene richness is positively correlated with both bacterial richness and bacterial diversity.
If we were to view these gut microbiomes as a community of organisms competing with one another for resources, in 2-3 sentences, explain how a higher AMR gene richness may help increase the richness and diversity of a microbiome.
2b) Conversely, how may a higher bacterial richness increase the number of AMR genes present within a microbiome?
NOTE: Feel free to use additional resources, or the original paper, to help you answer these questions. Be sure to cite any resource you use.
For the final part of this task, you will need to access the CARD outputs provided to you within the MRM_CSV_FILES folder. Within this folder, you will find a CSV file for each of the four animal populations, each listing every AMR gene identified within its respective metagenomic sample. These lists are ordered by AMR gene abundance, with the first AMR gene of each list being the most abundant within that population.
3a) For each animal, identify its most abundant AMR gene. For each AMR gene, describe which antibiotic it gives resistance to, and provide a brief description of the mechanism of the AMR gene.
3b) From the images above, we see that chickens and foxes have the greatest AMR gene richness for their population type, respectively. For the top AMR gene of each animal, provide possible reasons for why resistance to those antibiotics may have been conferred within each of these populations. To do so, you should take into consideration (1) antibiotic use within agriculture and (2) possible food resources for each of the two animals. Feel free to use the original publication as a resource, or you may reference additional outside sources.
💻 Task 3: Putting your knowledge to use: analyzing the AMR gene population of your own metagenome! ¶
While we were able to successfully replicate the findings of Skarżyńska et al., one question still remains: Is this a universal trend or one that is unique to this study?
To address this, you will need to find a publicly available metagenomic sample from the NCBI database. We will discuss how to find your metagenomic sample during class.
For the first part of your task, you will need to search the SRA database and find one metagenomic sample that uses whole genome sequencing (WGS). You will need to submit a link to the NCBI page of your sample on Blackboard.
Using the SRR accession number for your metagenomic sample, you will run FASTQ Utilities to check the quality of your raw reads. You will need to submit a screenshot of the quality reports for your forward and reverse reads. If the quality of your reads drops below 28, let us know and we will assist you with the next step.
Next, you will analyze the taxonomic composition of your sample using the TCS. You will repeat the same requirements described in Task 1 above.
Lastly, you will analyze the AMR gene population of your metagenome using the MRMS. With your results, you will need to:
a) Count the number of AMR genes present within your sample. Does this value align with either the wild or domestic animal populations of the original study?
b) Which AMR gene is the most abundant within your metagenome? Similar to Task 2, you should identify which antibiotic this gene confers resistance to, and briefly describe the mechanism.
c) Lastly, conduct some additional research and write 2-3 sentences on why this AMR gene is abundant within your particular organism. While conducting your research, it may help to consider (1) the population type of your organism, (2) if your domestic animal is a house pet or farm animal, and (3) the country of origin, since countries may differ in their legislation regarding antibiotic use.
NOTE: In addition to answering this question, please also submit the .tsv file containing your MRMS output data.
Feel free to be creative when considering which animal metagenome you would like to use. You may be surprised by what samples are present within the NCBI database. Below, we have provided examples of animals you could consider:
Cat, Dog, Ferret, Horse, Deer, Fox, Boar, Pig, Rat, Mice, Monkey, Raccoon, Opossum, Rabbits, Bears, Rhino, Zebra, Elephants, Armadillo, Sheep, Capybara, Cheetah, Chimpanzee, Gorilla, Panda, Marmoset, Kangaroo, Red Panda, Squirrel
Resources¶
Skarżyńska M, Leekitcharoenphon P, Hendriksen RS, Aarestrup FM, Wasyl D (2020) A metagenomic glimpse into the gut of wild and domestic animals: Quantification of antimicrobial resistance and more. PLoS ONE 15(12):e0242987. https://doi.org/10.1371/journal.pone.0242987
Wattam, A. R., Bowers, N., Brettin, T., Conrad, N., Cucinell, C., Davis, J. J., Dickerman, A. W., Dietrich, E. M., Kenyon, R. W., Machi, D., Mao, C., Nguyen, M., Olson, R. D., Overbeek, R., Parrello, B., Pusch, G. D., Shukla, M., Stevens, R. L., Vonstein, V., & Warren, A. S. (2024). Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance. In J. C. Setubal, P. F. Stadler, & J. Stoye (Eds.), Comparative Genomics (Vol. 2802, pp. 547–571). Springer US. https://doi.org/10.1007/978-1-0716-3838-5_18
Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc
Clausen, P. T., Aarestrup, F. M., & Lund, O. (2018). Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics, 19(1), 307.
Alcock, B. P., et al. (2020). CARD 2020: Antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Research, 48(D1), D517–D525.
❗❗Supplemental information: Conducting your NCBI search and interpreting your MRM results ¶
Finding a suitable metagenome on NCBI ¶
As mentioned in your exercise handout, Task 3 will require you to search the NCBI database for a valid metagenomic sample from either a domestic or wild animal to conduct your analysis on. To start, you can access the NCBI website from the link above.
To begin your search, you should select SRA from the drop-down menu and enter your search into the search bar. NCBI GenBank is a HUGE database, storing over 3.7 billion nucleotide sequences from tens to hundreds of thousands of user submissions. To help increase the efficiency of your search, using the right keywords is imperative. Keywords such as "gut", "feces", and "metagenome" will help focus your search efforts when looking for a viable metagenome. For this example, we will be searching for a viable sheep gut metagenome sample.
After beginning your search, you can further filter your results by selecting Illumina under Platform, fastq under File Type, and Genome under Strategy, all of which appear under the additional filter settings shown on the left side of your screen.
Luckily for us, the first option of our filtered search looks to be a viable one 🙌 !
NOTE: Within your search list, it is normal to see multiple submissions with the same name. This is because these samples were all submitted as part of the same project, yet each is still an individual sample (designated by its own SRR accession number). Ideally, you should all perform your analyses on different samples. However, it is OK to choose a unique sample from the same project as someone you enjoy working on your class material with (e.g., your choices can have the same search name, as long as you use different SRR numbers).
But how do we KNOW this is a sample we can use for our analysis? There are three descriptions noted under the Library information that you should confirm before making your selection:
Instrument: Any form of Illumina sequencing
Strategy: WGS
Source: Metagenomic
As long as these three requirements are met, your selection should be suitable for the analysis ahead.
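If you are screening several candidate runs, the three checks above can be written down as a small Python function. This is only a sketch: the dictionary keys mirror the labels on the NCBI page, the example records are made up, and `ILLUMINA_KEYWORDS` is an assumed, non-exhaustive list of substrings found in Illumina instrument names.

```python
# Assumed, non-exhaustive substrings that appear in Illumina instrument names.
ILLUMINA_KEYWORDS = ("illumina", "hiseq", "miseq", "nextseq", "novaseq")


def is_suitable_run(library):
    """Check the Instrument, Strategy, and Source fields of one SRA run."""
    instrument = library.get("Instrument", "").lower()
    strategy = library.get("Strategy", "").strip().upper()
    source = library.get("Source", "").strip().upper()
    is_illumina = any(keyword in instrument for keyword in ILLUMINA_KEYWORDS)
    return is_illumina and strategy == "WGS" and source == "METAGENOMIC"


# Hypothetical library descriptions copied by hand from two SRA pages.
candidate_a = {"Instrument": "Illumina MiSeq", "Strategy": "WGS",
               "Source": "METAGENOMIC"}
candidate_b = {"Instrument": "Illumina MiSeq", "Strategy": "AMPLICON",
               "Source": "METAGENOMIC"}

print(is_suitable_run(candidate_a))
print(is_suitable_run(candidate_b))
```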
Interpreting your MRM results ¶
Converting your .res file to a .tsv file¶
The final part of Task 3 will require you to analyze your MRMS output to address the questions regarding the presence of AMR genes within your sample. While the results are easy to interpret, it will take a small effort to convert your results into a file that you can view and manipulate easily.
For this example, I will be using MRMS results for a dog gut metagenome. To begin, you will need to download the kma.res file. Unfortunately, the .res file type is not viewable in its current form and will need to be converted to a .tsv first.
You should be able to find this file in your Downloads folder (or whichever folder you designated), which can be viewed in Finder (for Mac users) or File Explorer (for Windows users). You can convert the file extension from .res to .tsv by either right-clicking the file, selecting Rename, and renaming the file with the appropriate extension, or by double-clicking the file to rename it.
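If you prefer not to rename the file by hand, the same result can be obtained with a few lines of Python. The sketch below makes a copy with the .tsv extension and leaves the original .res file untouched. The function name and the example path are hypothetical, so adjust the path to wherever your download was saved.

```python
import shutil
from pathlib import Path


def copy_res_as_tsv(res_path):
    """Copy a .res file to a sibling file with a .tsv extension and return it."""
    source = Path(res_path)
    destination = source.with_suffix(".tsv")
    shutil.copyfile(source, destination)  # the original file is left in place
    return destination


# Example call (hypothetical location of the downloaded file):
# copy_res_as_tsv(Path.home() / "Downloads" / "kma.res")
```

Only the extension changes. The contents are already tab-separated text, which is why a spreadsheet program can open the copy directly.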
Once complete, your file will be viewable in either Microsoft Excel or Google Sheets. Below, I will demonstrate how to organize your file to easily interpret your results within each platform.
Using Excel¶
Begin by opening your .tsv file in Excel. I recommend stretching out the first column containing the template information in order to easily read the names of the AMR genes listed within your file.
Calculating AMR gene richness¶
You should be able to easily determine the number of AMR genes by counting the number of rows within your file. If you scroll down to the last row, you can subtract 1 from the last row number (which excludes the row of column names from your AMR gene count) to obtain this value.
Sorting your file to determine the most abundant AMR gene¶
As mentioned in the exercise handout, we use the Depth column as a proxy for the abundance of each AMR gene identified within our metagenome. To organize your table from highest depth to lowest depth, under the Home tab, click on Sort and Filter and select Custom Sort.
Excel should then select your entire table, and a menu will appear on your screen. Using the Sort By custom selections, you should select Depth for Column, Values for Sort On, and Largest to Smallest for Order. After you click OK, your table will be reorganized with your most abundant AMR gene listed at the top. Remember that the name of each gene will likely follow the reference gene identifier number (if you see something else, please let us know!).
Using Google Sheets¶
Once you are in Google Sheets, start by creating a new spreadsheet. Click on the File tab and select Import. Navigate to the Upload tab, then find and select the .tsv file. An Import File menu will appear on your screen.
Select Replace spreadsheet under Import location and Tab under Separator type (FYI: tsv = tab-separated values) and hit Import data. Your tsv file should now be viewable on Google Sheets.
Calculating AMR gene richness¶
You should be able to easily determine the number of AMR genes by counting the number of rows within your file. If you scroll down to the last row, you can subtract 1 from the last row number (which excludes the row of column names from your AMR gene count) to obtain this value.
Sorting your file to determine the most abundant AMR gene¶
As mentioned in the exercise handout, we use the Depth column as a proxy for the abundance of each AMR gene identified within our metagenome. To organize your table from highest depth to lowest depth, select the Create a filter option from the on-screen customization options (the third-to-last).
This option should select your entire spreadsheet, and icons will appear next to the titles of each column header. Select the icon next to the Depth column header and choose Sort Z to A. Your table will be reorganized with your most abundant AMR gene listed at the top. Remember that the name of each gene will likely follow the reference gene identifier number (if you see something else, please let us know!).
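For those comfortable with a little scripting, the same two steps (counting the AMR genes and sorting by Depth) can be sketched in Python using only the standard library. This assumes the file is tab-separated, has a single header row, stores the gene name in its first column, and has a column named Depth. The function names are hypothetical, and the leading "#" stripped from the header is an assumption about how the header row is written.

```python
import csv


def load_kma_results(path):
    """Read a tab-separated results file into a header list and row dicts."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        # Strip padding and a possible leading '#' from the column names.
        header = [name.strip().lstrip("#") for name in next(reader)]
        rows = [dict(zip(header, (value.strip() for value in row)))
                for row in reader if row]
    return header, rows


def summarize_amr_genes(path):
    """Return (number of AMR genes, list of (gene, depth) sorted high to low)."""
    header, rows = load_kma_results(path)
    gene_column = header[0]  # assumed: first column holds the gene/template name
    ranked = sorted(rows, key=lambda row: float(row["Depth"]), reverse=True)
    return len(rows), [(row[gene_column], float(row["Depth"])) for row in ranked]


# Example call (hypothetical file name):
# gene_count, ranked_genes = summarize_amr_genes("kma.tsv")
# print(gene_count, ranked_genes[0])
```

The first value returned corresponds to the row count minus the header, and the first entry of the ranked list is the gene that would appear at the top of the sorted spreadsheet.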